Resampling Methods in Software Quality Classification
Abstract
Given the number of algorithms available for classification and prediction in software engineering, a systematic way of assessing their performance is needed. Performance assessment is typically done by some form of partitioning or resampling of the original data to alleviate biased estimation. For predictive and classification studies in software engineering, there is a lack of definitive advice on the most appropriate resampling method to use. This is seen as one of the contributing factors for not being able to draw general conclusions on which modeling technique or set of predictor variables is the most appropriate. Furthermore, the use of a variety of resampling methods makes it impossible to perform any formal meta-analysis of the primary study results. Therefore, it is desirable to examine the influence of various resampling methods and to quantify possible differences.

Objective and method: This study empirically compares five common resampling methods (hold-out validation, repeated random sub-sampling, 10-fold cross-validation, leave-one-out cross-validation and nonparametric bootstrapping) using 8 publicly available data sets, with genetic programming (GP) and multiple linear regression (MLR) as software quality classification approaches. The location of (PF, PD) pairs in the ROC (receiver operating characteristics) space and the area under an ROC curve (AUC) are used as accuracy indicators.

Results: In terms of the location of (PF, PD) pairs in the ROC space, bootstrapping results are in the preferred region for 3 of the 8 data sets for GP and for 4 of the 8 data sets for MLR. Based on the AUC measure, there are no significant differences between the different resampling methods using GP and MLR.

Conclusion: Certain data set properties may be responsible for the insignificant differences between the resampling methods based on AUC. These include imbalanced data sets, insignificant predictor variables and high-dimensional data sets. With the current selection of data sets and classification techniques, bootstrapping is a preferred method based on the location of (PF, PD) pair data in the ROC space. Hold-out validation is not a good choice for comparatively smaller data sets, where leave-one-out cross-validation (LOOCV) performs better. For comparatively larger data sets, 10-fold cross-validation performs better than LOOCV.
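As a rough illustration of the five resampling schemes named in the abstract, the following Python sketch generates (train, test) index splits for each of them and computes a (PF, PD) pair from binary labels. It is not the study's implementation. The function names, the one-third test fraction, the number of repeats, and the use of out-of-bag instances as the bootstrap test set are all assumptions made for this example. PF and PD follow the usual definitions (false alarm rate and detection rate), which the abstract itself does not spell out.

```python
import random

def hold_out(n, test_fraction=1/3, seed=0):
    """Single random partition; the 1/3 test fraction is an assumption."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return [(idx[n_test:], idx[:n_test])]

def repeated_subsampling(n, repeats=10, test_fraction=1/3, seed=0):
    """Hold-out repeated with a different shuffle on each repeat."""
    return [hold_out(n, test_fraction, seed + r)[0] for r in range(repeats)]

def k_fold(n, k=10, seed=0):
    """k disjoint test folds covering every instance once (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

def leave_one_out(n):
    """n splits, each testing on exactly one instance."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

def bootstrap(n, repeats=10, seed=0):
    """Train on n draws with replacement; test on the instances never drawn.
    Using the out-of-bag instances as the test set is an assumption, and that
    set can be empty when n is very small."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        splits.append((train, [j for j in range(n) if j not in drawn]))
    return splits

def pf_pd(actual, predicted):
    """Return (PF, PD) for 0/1 labels, where 1 marks a fault-prone module.
    PF = FP / (FP + TN), PD = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    return pf, pd
```

Each generator returns a list of (train, test) index pairs, so a classifier can be fitted on the train indices and scored on the test indices in the same loop regardless of the scheme. In ROC space, a point with low PF and high PD lies toward the upper-left corner.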
Similar resources
Bootstrap and jackknife resampling methods in survival analysis of patients with thalassemia major
Background and Objectives: A small sample size can influence the results of statistical analysis. A reduction in the sample size may happen due to different reasons, such as loss of information, i.e. existing missing value in some variables. This study aimed to apply bootstrap and jackknife resampling methods in survival analysis of thalassemia major patients. Methods: In this historical coh...
High-quality multi-pass image resampling
This paper develops a family of multi-pass image resampling algorithms that use one-dimensional filtering stages to achieve high-quality results at low computational cost. Our key insight is to perform a frequency-domain analysis to ensure that very little aliasing occurs at each stage in the multi-pass transform and to insert additional stages where necessary to ensure this. Using one-dimensio...
Validation and Verification
This chapter discusses important aspects of the validation and verification of neural network models including selection of appropriate error metrics, analysis of residual errors and resampling methodologies for validation under conditions of sparse data. Error metrics reviewed include mean absolute error, root mean squared error, percent good and absolute distance error. The importance of inte...
ROC Analysis in Machine Learning: Resampling Methods for the Area under the ROC Curve
Receiver Operating Characteristic (ROC) analysis is a common tool for assessing the performance of various classification tools including biological markers, diagnostic tests, technologies or practices and statistical models. ROC analysis gained popularity in many fields including diagnostic medicine, quality control, human perception studies and machine learning. The area under the ROC curve (...
Journal:
- International Journal of Software Engineering and Knowledge Engineering
Volume 22, Issue
Pages -
Publication date: 2012